📺 OK, not really!
This is actually about tidyr::pivot_*() functions, which you can learn all about in the new tidyr version 1.0.0 Pivoting vignette!
But, I will be using some nifty TV-related data. So, I'm sticking with the name.
library(tidyverse)
sheet <- googlesheets::gs_title("bobs_burgers_survey_results")
bobs_ws <- googlesheets::gs_ws_ls(sheet)
raw_dat <- sheet %>%
  googlesheets::gs_read(ws = glue::glue("{bobs_ws}"))
## Accessing worksheet titled 'Form Responses 1'.
## Parsed with column specification:
## cols(
##   Timestamp = col_character(),
##   `Members of the Belcher family with whom I identify (select all that apply)` = col_character()
## )
belcher_results <- tibble::rowid_to_column(raw_dat, "resp_id") %>%
  dplyr::rename("response" = `Members of the Belcher family with whom I identify (select all that apply)`) %>%
  dplyr::select(-Timestamp)
Because I used a Google Form to collect this data, I don't have to worry about the order of names, since they come out the same every time.
agg_results <- belcher_results %>%
  dplyr::group_by(response) %>%
  dplyr::summarise(total = n()) %>%
  dplyr::arrange(desc(total))
agg_results
## # A tibble: 30 x 2
##    response                       total
##    <chr>                          <int>
##  1 Bob                               49
##  2 Tina                              29
##  3 Louise                            23
##  4 Bob, Tina                         22
##  5 Bob, Louise                       17
##  6 Bob, Tina, Louise                 17
##  7 Tina, Louise                      12
##  8 Bob, Linda, Tina, Gene, Louise    11
##  9 Bob, Tina, Gene                   10
## 10 Bob, Gene                          8
## # … with 20 more rows
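As an aside, the group_by() / summarise() / arrange() chain can also be written with dplyr::count(). Here is a minimal sketch on a made-up tibble; toy_results and its values are hypothetical stand-ins, not the survey data:

```r
library(dplyr)

# Hypothetical stand-in for belcher_results: one row per respondent
toy_results <- tibble::tibble(
  resp_id  = 1:6,
  response = c("Bob", "Bob, Tina", "Bob", "Tina", "Bob, Tina", "Bob")
)

# Long form, mirroring the chain used in the post
toy_long_way <- toy_results %>%
  group_by(response) %>%
  summarise(total = n()) %>%
  arrange(desc(total))

# Shorthand: count() groups and tallies; sort = TRUE orders by the tally
toy_short_way <- toy_results %>%
  count(response, sort = TRUE, name = "total")
```

Both objects should hold the same three rows in the same order, so which one to use is mostly a matter of taste.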
tidyr::separate_rows()
Let's make the responses long with tidyr::separate_rows(). I'm also adding a numeric variable, identify (short for "character(s) with whom I identify"), and converting the respondent IDs to character so that they don't get erroneously treated as numbers.
belcher_results <- belcher_results %>%
  tidyr::separate_rows(response) %>%
  dplyr::mutate(identify = 1,
                resp_id = as.character(resp_id))
head(belcher_results)
## # A tibble: 6 x 3
##   resp_id response identify
##   <chr>   <chr>       <dbl>
## 1 1       Linda           1
## 2 1       Tina            1
## 3 1       Louise          1
## 4 2       Bob             1
## 5 2       Gene            1
## 6 3       Bob             1
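One thing worth knowing: by default separate_rows() splits on any run of non-alphanumeric characters (its sep default is "[^[:alnum:].]+"), which works here because the names are single words. If a choice ever contained a space, passing the delimiter explicitly would be safer. A small sketch on hypothetical data (toy_results is made up for illustration):

```r
library(tidyr)

# Hypothetical responses in the same "comma + space" format as the form output
toy_results <- tibble::tibble(
  resp_id  = c(1, 2),
  response = c("Linda, Tina, Louise", "Bob, Gene")
)

# Split only on the literal ", " delimiter: one row per selected name
toy_long <- separate_rows(toy_results, response, sep = ", ")
```

The result has five rows: three for respondent 1 and two for respondent 2.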
tidyr::pivot_wider()
Now we'll take one of the "new" tidyr verbs for a spin: pivot_wider(). Rather than fill things out with a bunch of NAs, we'll prepare our data for use with the UpSetR package by turning it into binary indicators (filling the gaps with 0s), and ditch the respondent ID at the end.
binary_tib <- belcher_results %>%
  tidyr::pivot_wider(
    names_from = response,
    values_from = identify,
    values_fill = list(identify = 0)
  ) %>%
  dplyr::select(-resp_id)
head(binary_tib)
## # A tibble: 6 x 5
##   Linda  Tina Louise   Bob  Gene
##   <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     1     1      1     0     0
## 2     0     0      0     1     1
## 3     0     1      0     1     1
## 4     1     1      1     1     1
## 5     0     0      0     1     0
## 6     0     0      1     0     0
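To see what values_fill is doing, here is the same pivot on a tiny hypothetical long table (toy_long is made up for illustration). Respondent "1" never picked Bob and respondent "2" never picked Linda or Tina, so those cells would be NA without the fill:

```r
library(tidyr)

# Hypothetical long data: one row per (respondent, character) selection
toy_long <- tibble::tibble(
  resp_id  = c("1", "1", "2"),
  response = c("Linda", "Tina", "Bob"),
  identify = 1
)

# One column per character; missing combinations become 0 instead of NA
toy_wide <- pivot_wider(
  toy_long,
  names_from  = response,
  values_from = identify,
  values_fill = list(identify = 0)
)
```

The new columns appear in order of first appearance in response (Linda, Tina, Bob), which is also why the column order in the real output follows the first respondent's answers.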
UpSetR::upset()
I highly recommend Paul Campbell's code-through using UpSetR, which gave me (among other things) the pro tip that upset() does not like tibbles (hence the as.data.frame() call below).
binary_df <- as.data.frame(binary_tib)
UpSetR::upset(binary_df, nsets = 5, order.by = "freq")
In essence, our desired output above (the UpSet plot) dictated the format of our data. If we wanted to use a Venn diagram (for example, using the VennDiagram package), we'd want our data in yet another format.
We'll use our long data frame from before, belcher_results. What we want is a set of respondents who identified with each character. For example, if I wanted just the respondents who chose Bob, I would do the following:
bob <- belcher_results %>%
  filter(response == "Bob") %>%
  pull(resp_id)
I'll make a little helper function, and do the same for the rest of the family. (Yes, this could be refactored to be much more efficient, but the names of the members of the Belcher family roll off my fingertips easily enough.)
# little brittle helper
make_set <- function(x) {
  belcher_results %>%
    filter(response == x) %>%
    pull(resp_id)
}
linda <- make_set("Linda")
tina <- make_set("Tina")
gene <- make_set("Gene")
louise <- make_set("Louise")
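For the curious, one possible refactor is base R's split(), which builds every set in a single call and returns them as a named list. This is only a sketch on hypothetical data (toy_long stands in for the long belcher_results data frame):

```r
# Hypothetical long data with the same two columns the helper relies on
toy_long <- data.frame(
  resp_id  = c("1", "1", "2", "3", "3"),
  response = c("Linda", "Tina", "Bob", "Bob", "Tina"),
  stringsAsFactors = FALSE
)

# split() groups resp_id by response: one character vector per family member
toy_sets <- split(toy_long$resp_id, toy_long$response)

# Individual sets are then available by name
toy_bob <- toy_sets[["Bob"]]
```

A list like this has the shape that venn.diagram() expects for its x argument, and names(toy_sets) could supply the category labels.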
Now I'll follow this handy tutorial from the R Graph Gallery, and turn this into a basic Venn diagram.
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
venn.diagram(
  x = list(bob, linda, tina, gene, louise),
  category.names = c("Bob", "Linda", "Tina", "Gene", "Louise"),
  filename = "belcher_venn.png"
)
## [1] 1
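Each region of a Venn diagram is just a set intersection, so a quick sanity check on the plot is possible with base R set functions. A minimal sketch with made-up ID vectors (bob_toy, tina_toy, and gene_toy are hypothetical, not the survey sets):

```r
# Hypothetical sets of respondent IDs, in the same shape as bob, tina, etc.
bob_toy  <- c("2", "3", "4")
tina_toy <- c("1", "3", "4")
gene_toy <- c("4", "5")

# Respondents who appear in every set (the innermost region of the diagram)
in_all <- Reduce(intersect, list(bob_toy, tina_toy, gene_toy))

# Respondents who chose Bob and Tina but not Gene
bob_tina_only <- setdiff(intersect(bob_toy, tina_toy), gene_toy)
```

The lengths of these vectors should match the counts printed in the corresponding regions of the diagram.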